Decoder Technology for Connectionist Large Vocabulary Speech Recognition
Authors
Abstract
The search problem in large vocabulary continuous speech recognition (LVCSR) is to locate the most probable string of words for a spoken utterance given the acoustic signal and a set of sentence models. Searching the space of possible utterances is difficult because of the large vocabulary size and the complexity imposed when long-span language models are used. This report describes an efficient search procedure and its software embodiment in a decoder, NOWAY, which has been incorporated in ABBOT, a hybrid connectionist/hidden Markov model (HMM) LVCSR system [15]. The search algorithm is based on stack decoding and uses both likelihood- and posterior-based pruning. Posterior-based phone deactivation pruning is well suited to hybrid connectionist/HMM systems because posterior phone probabilities are computed directly by the connectionist acoustic model. The single-pass decoder has been evaluated on the large vocabulary North American Business News task using a 20,000-word vocabulary and a trigram language model. The results indicate that phone deactivation pruning increased the search speed by an order of magnitude while incurring 2% or less relative search error. On a Pentium-based PC system, evaluation-quality decoding (less than 3% relative search error) was available at execution speeds 2–5 times slower than real time, and real-time decoding was available at the cost of 4–12% relative search error.
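The idea behind phone deactivation pruning can be illustrated with a minimal sketch. It assumes one plausible reading of the technique: at each frame, any phone whose posterior probability falls below a fixed threshold is deactivated, so the search skips hypotheses that pass through it. The function names, the threshold value of 0.1, and the toy posteriors are hypothetical and are not taken from NOWAY or ABBOT.

```python
from typing import Dict, List, Set


def active_phones(posteriors: List[Dict[str, float]],
                  threshold: float) -> List[Set[str]]:
    """For each frame, keep only phones whose posterior is >= threshold.

    `posteriors` holds one dict per frame, mapping phone label to the
    posterior probability estimated by the acoustic model (assumed input).
    """
    return [
        {phone for phone, prob in frame.items() if prob >= threshold}
        for frame in posteriors
    ]


def pruned_fraction(posteriors: List[Dict[str, float]],
                    threshold: float) -> float:
    """Fraction of (frame, phone) pairs deactivated by the threshold."""
    total = sum(len(frame) for frame in posteriors)
    if total == 0:
        return 0.0
    kept = sum(len(active) for active in active_phones(posteriors, threshold))
    return 1.0 - kept / total


# Toy example: two frames, three phones (illustrative values only).
frames = [
    {"ah": 0.90, "b": 0.07, "t": 0.03},
    {"ah": 0.50, "b": 0.45, "t": 0.05},
]
active = active_phones(frames, threshold=0.1)
fraction = pruned_fraction(frames, threshold=0.1)
```

In this toy input, only "ah" survives in the first frame and both "ah" and "b" survive in the second, so half of the frame/phone pairs are never expanded by the search. A real decoder would combine such a mask with likelihood-based pruning, as the abstract notes.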
Similar resources
Spoken Term Detection for Persian News of Islamic Republic of Iran Broadcasting
Islamic Republic of Iran Broadcasting (IRIB), one of the biggest broadcasting organizations, produces thousands of hours of media content daily. Accordingly, IRIB's archive is one of the richest archives in Iran, containing a huge amount of multimedia data. Monitoring this massive volume of data, and browsing and retrieval of this archive, is one of the key issues for this broadcasting...
Recent improvements to the ABBOT large vocabulary CSR system
ABBOT is the hybrid connectionist-hidden Markov model (HMM) large-vocabulary continuous speech recognition (CSR) system developed at Cambridge University. This system uses a recurrent network to estimate the acoustic observation probabilities within an HMM framework. A major advantage of this approach is that good performance is achieved using context-independent acoustic models and requiring m...
Start-synchronous search for large vocabulary continuous speech recognition
In this paper, we present a novel, efficient search strategy for large vocabulary continuous speech recognition. The search algorithm, based on a stack decoder framework, utilizes phone-level posterior probability estimates (produced by a hybrid connectionist/HMM acoustic model) as a basis for phone deactivation pruning — a highly efficient method of reducing the required computation. The singl...
Towards large vocabulary ASR on embedded platforms
In this paper we present an overview of an automatic speech recognition system implementation in the context of embedded systems. Specific challenges presented by low resource platforms will be addressed for the basic components of an ASR decoder. Our main objective is to utilize and modify the technology developed for large vocabulary ASR to achieve efficient LVCSR on embedded systems as well.
Publication date: 1995